Abstract: This paper considers receding horizon control of finite deterministic systems that must satisfy a high-level, rich specification expressed as a linear temporal logic formula. Time-varying rewards are assumed to be associated with the states of the system and to be observable in real time. The control objective is to maximize the collected reward while satisfying the high-level task specification. To react properly to the changing rewards, we propose a controller synthesis framework inspired by model predictive control. At each time-step, the rewards are locally optimized over a finite horizon and the immediate optimal control is applied. By enforcing appropriate constraints, the infinite trajectory produced by the controller is guaranteed to satisfy the desired temporal logic formula. Simulation results demonstrate the effectiveness of the approach.

I. Introduction 

This paper considers the problem of controlling a deterministic discrete-time system with a finite state-space, also referred to as a finite transition system. Such systems can capture the behaviors of more complex dynamical systems and, as a result, greatly reduce the complexity of control design.

A finite transition system can be constructed from a continuous system via an "abstraction" process. For example, the motion of an autonomous robotic vehicle moving in an environment can be abstracted to a finite system through a partition of the environment. The set of states can be seen as a set of labels for the regions in the partition, and each transition corresponds to a controller driving the vehicle between two adjacent regions. For environments partitioned into simplicial, rectangular or polyhedral regions, continuous feedback controllers have been developed that drive a robotic system from any point inside a region to a desired facet of an adjacent region. Such controllers exist for linear [1], multi-affine [2], piecewise-affine [3]-[5], and non-holonomic (unicycle) [6], [7] dynamical models. The initial continuous dynamical system and the abstract discrete finite system can be related through simulation or bisimulation relations [8]. This allows a control problem for the more complex continuous system to be solved on the "equivalent" abstract system.

Several authors [1], [5], [9]-[11] have proposed temporal logics, such as linear temporal logic (LTL) and computation tree logic (CTL) [12], as specification languages for finite transition systems, because of their well-defined syntax and semantics. These logics make it easy to specify complex behavior. LTL in particular can express persistent mission tasks such as "pick up items at the region pickup, and then drop them off at the region dropoff, infinitely often, while always avoiding unsafe regions". In computer science, these temporal logics are applied in model checking [12] and temporal logic games [13]. These applications have produced off-the-shelf tools and algorithms that can be readily adapted to synthesize provably correct control strategies [1], [5], [9]-[11].

While the works mentioned above address the temporal logic controller synthesis problem, several questions remain open. In particular, it is still unsolved how to combine temporal logic controller synthesis with optimality with respect to a suitable cost function. The problem becomes even harder if the optimization depends on time-varying parameters, e.g., dynamic events that occur during the operation of the plant. For traditional control problems (without temporal logic constraints) and dynamical systems, this problem can be addressed effectively with model predictive control (MPC) (see, e.g., [14]). MPC has reached a mature level in both academia and industry, with many successful implementations. The basic MPC set-up consists of the following steps. At each time instant, a cost function of the current state is optimized over a finite horizon. Only the first element of the optimal finite sequence of controls is applied. The whole process is then repeated at the next time instant for the new measured state. For this reason, MPC is also referred to as receding horizon control. Because the finite horizon optimization problem is solved again at each time instant, real-time dynamical events can be managed effectively.
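The toy sketch below illustrates the MPC recipe just described; it is not taken from the paper. It runs a receding horizon loop on the integer line $\{0, \ldots, 9\}$. At each time $k$, it scores every sequence of $N$ moves in $\{-1, 0, +1\}$ by brute force, using the rewards observed at time $k$. It applies only the first move of the best sequence and then re-plans. The function name `receding_horizon`, the move set, and the reward `target_reward` are hypothetical choices made for illustration. The reward measures closeness to a target that jumps from 5 to 2 at $k = 6$.

```python
from itertools import product
from typing import Callable, List


def receding_horizon(x0: int, steps: int, N: int,
                     reward: Callable[[int, int], float],
                     lo: int = 0, hi: int = 9) -> List[int]:
    """Toy receding horizon loop on the integer line {lo, ..., hi}."""
    x, traj = x0, [x0]
    for k in range(steps):
        best_value, best_seq = None, None
        # optimize over the finite horizon: all move sequences of length N
        for seq in product((-1, 0, 1), repeat=N):
            pos, value, admissible = x, 0.0, True
            for u in seq:
                pos += u
                if not lo <= pos <= hi:
                    admissible = False
                    break
                value += reward(pos, k)   # rewards as observed at time k
            if admissible and (best_value is None or value > best_value):
                best_value, best_seq = value, seq
        x += best_seq[0]                  # apply only the first control ...
        traj.append(x)                    # ... and re-plan at time k + 1
    return traj


def target_reward(pos: int, k: int) -> float:
    # hypothetical reward: closeness to a target that jumps from 5 to 2 at k = 6
    target = 5 if k < 6 else 2
    return -abs(pos - target)


print(receding_horizon(0, 10, N=3, reward=target_reward))
# [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 2]
```

Because the plan is recomputed at every step, the trajectory turns back one step after the target moves. A scheme that executes the whole $N$-step plan before re-planning would react only at its next re-planning instant.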

However, it is not yet well understood how to combine a receding horizon control approach with a provably correct control strategy satisfying a temporal logic formula. This paper addresses the issue for a specific system set-up (deterministic systems on a finite state-space) and problem formulation (dynamic optimization of rewards). More specifically, the receding horizon controller maximizes, over a finite horizon, the accumulated rewards associated with the states of the system. The rewards are assumed to change dynamically with time and to be observable only in real time. They model dynamical events that can be triggered in real time, a model often used in the coverage control literature [15].

The key challenge in this framework is to ensure both the correctness of the produced infinite trajectory and the recursive feasibility of the optimization problem solved at each time-step. A constrained MPC optimization problem solved recursively on-line is called feasible at all times, or recursively feasible, under the following condition. If the problem is feasible (has a solution) for the initial state at the initial time, it remains feasible at all future time instants. At those instants it is solved from the different initial conditions produced by the closed-loop trajectory. We prove that the proposed receding horizon control framework has both properties. In standard MPC, certain terminal constraints must be enforced in the optimization problem to guarantee properties of the system such as stability. Similarly, correctness of the produced trajectory and recursive feasibility are ensured here through a set of suitable constraints.

This work extends and generalizes the set-up presented in [16], where a similar control objective was tackled. In [16], the optimization-based controller solved a finite horizon optimal control problem every $N$ steps and implemented the complete sequence of control actions. This procedure is closer to finite-horizon optimal control than to true receding horizon control. Its main drawback is that it cannot react to dynamical events (i.e., rewards) that are triggered or vary during the execution of the finite trajectory. This paper removes this limitation with a truly receding horizon controller for deterministic systems on a finite state-space. Another related work is [5], which obtained a provably correct control strategy for large-scale systems. It did so by dividing the control synthesis problem into smaller sub-problems in a receding-horizon-like manner. However, [5] handled dynamical events differently and restricted the specification language to a fragment of LTL, whereas this paper allows full LTL expressivity.

II. Problem Formulation and Approach 

In this paper, we consider a discrete-time system with 
a finite state space, i.e., the system evolves on a graph. 
Each vertex of the graph produces an output, which is a 
set of observations. Such a system can be described by a 
finite deterministic transition system, which can be formally 
defined as follows. 

Definition II.1 (Finite Deterministic Transition System). A finite (weighted) deterministic transition system (DTS) is a tuple $\mathcal{T} = (Q, q_0, \Delta, \omega, \Pi, h)$, where

• $Q$ is a finite set of states;

• $q_0 \in Q$ is the initial state;

• $\Delta \subseteq Q \times Q$ is the set of transitions;

• $\omega : \Delta \to \mathbb{R}^+$ is a weight function that assigns positive values to all transitions;

• $\Pi$ is a set of observations; and

• $h : Q \to 2^{\Pi}$ is the observation map.

For convenience of notation, we write $q \to_{\mathcal{T}} q'$ if $(q, q') \in \Delta$. We assume $\mathcal{T}$ to be non-blocking: for each $q \in Q$, there exists $q' \in Q$ such that $q \to_{\mathcal{T}} q'$ (such a system is also called a Kripke structure [17]). A trajectory of a DTS is an infinite sequence $\mathbf{q} = q_0 q_1 \ldots$ where $q_k \to_{\mathcal{T}} q_{k+1}$ for all $k \geq 0$. A trajectory $\mathbf{q}$ generates an output trajectory $\mathbf{o} = o_0 o_1 \ldots$, where $o_k = h(q_k)$ for all $k \geq 0$.
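Definition II.1 maps naturally onto a small data structure. The Python sketch below is one possible encoding, assuming states are arbitrary hashable labels. The class name `DTS` and its helper methods are illustrative and not part of any released software. Trajectories are infinite, so the helpers only check finite prefixes and compute the matching prefix of the output trajectory.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Set, Tuple

State = Hashable  # any hashable label, e.g. a region name or a grid coordinate


@dataclass
class DTS:
    """Finite weighted DTS T = (Q, q0, Delta, omega, Pi, h) of Def. II.1 (sketch)."""
    Q: Set[State]
    q0: State
    Delta: Set[Tuple[State, State]]
    omega: Dict[Tuple[State, State], float]   # positive weight of each transition
    Pi: Set[str]
    h: Dict[State, FrozenSet[str]]            # observation map h : Q -> 2^Pi

    def successors(self, q: State) -> List[State]:
        return [b for (a, b) in self.Delta if a == q]

    def is_non_blocking(self) -> bool:
        # assumption of Def. II.1: every state has at least one outgoing transition
        return all(self.successors(q) for q in self.Q)

    def is_trajectory_prefix(self, qs: List[State]) -> bool:
        # finite prefix q0 q1 ... qn with q_k ->_T q_{k+1}
        return bool(qs) and qs[0] == self.q0 and all(
            (a, b) in self.Delta for a, b in zip(qs, qs[1:]))

    def output(self, qs: List[State]) -> List[FrozenSet[str]]:
        # output trajectory o_k = h(q_k)
        return [self.h[q] for q in qs]


T = DTS(Q={"A", "B"}, q0="A",
        Delta={("A", "B"), ("B", "A"), ("B", "B")},
        omega={("A", "B"): 1.0, ("B", "A"): 2.0, ("B", "B"): 0.5},
        Pi={"base", "survey"},
        h={"A": frozenset({"base"}), "B": frozenset({"survey"})})
print(T.is_non_blocking(), T.is_trajectory_prefix(["A", "B", "B", "A"]))  # True True
print(T.output(["A", "B"]))
```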

Note that the definition of $\mathcal{T}$ has no control inputs. Because $\mathcal{T}$ is deterministic, controlling it amounts to choosing among the available transitions at a state. In other words, each transition $(q, q')$ corresponds to a unique control input at state $q$. It follows that a trajectory $\mathbf{q} = q_0 q_1 \ldots$ can serve as a control strategy for $\mathcal{T}$, by applying the transitions $(q_0, q_1)$, $(q_1, q_2)$, and so on. An example of a DTS is shown in Fig. 1.
Fig. 1. An example of a finite DTS $\mathcal{T}$ as defined in Def. II.1. In this example, $\mathcal{T}$ has 100 states at the vertices of a rectangular grid with cell size 10. The weight function $\omega$ is the Euclidean distance between vertices. There is a transition between two vertices if the Euclidean distance between them is less than 15. The set of observations is $\Pi = \{\mathtt{base}, \mathtt{survey}, \mathtt{recharge}, \mathtt{unsafe}\}$. States with no observation are shown with smaller vertices.
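The transition structure of Fig. 1 can be reproduced from the caption alone. The sketch below builds the 100 grid vertices and their Euclidean transition weights. It assumes that a vertex has no transition to itself. It omits the observation map, because the placement of base, survey, recharge and unsafe is given only in the figure. With cell size 10, diagonal neighbours (distance $10\sqrt{2} \approx 14.14$) are included. Corner, edge and interior vertices therefore have 3, 5 and 8 outgoing transitions, respectively. The function name `grid_transitions` is hypothetical.

```python
import math
from itertools import product


def grid_transitions(n: int = 10, cell: float = 10.0, radius: float = 15.0):
    """Vertices of an n x n grid and Euclidean weights of the transitions of Fig. 1."""
    Q = [(i * cell, j * cell) for i, j in product(range(n), repeat=2)]
    omega = {}
    for q, q2 in product(Q, repeat=2):
        d = math.dist(q, q2)
        if q != q2 and d < radius:        # assumption: no self-loops
            omega[(q, q2)] = d            # weight = Euclidean distance
    return Q, omega


Q, omega = grid_transitions()
out_degree = {q: 0 for q in Q}
for q, _q2 in omega:
    out_degree[q] += 1
print(len(Q), len(omega))                 # 100 684
print(sorted(set(out_degree.values())))   # [3, 5, 8]
```

Every vertex has at least three successors, so this DTS is non-blocking, as Def. II.1 requires.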



The goal of this paper is to synthesize trajectories $\mathbf{q}$ of $\mathcal{T}$ that satisfy a behavioral specification given as a linear temporal logic formula over $\Pi$. An LTL formula over $\Pi$ is interpreted over an (infinite) sequence $\mathbf{o} = o_0 o_1 \ldots$, where $o_k \subseteq \Pi$ for all $k \geq 0$. We say $\mathbf{q}$ satisfies an LTL formula $\phi$ if it generates an output trajectory $\mathbf{o}$ satisfying $\phi$. A detailed description of the syntax and semantics of LTL is beyond the scope of this paper and can be found in [12]. Roughly, an LTL formula is built from the observations in $\Pi$, the Boolean operators $\neg$ (negation), $\vee$ (disjunction), $\wedge$ (conjunction) and $\rightarrow$ (implication), and the temporal operators $\mathbf{X}$ (next), $\mathbf{U}$ (until), $\mathbf{F}$ (eventually) and $\mathbf{G}$ (always). For example, the natural-language task "Reach a survey location infinitely often, and always avoid unsafe states" translates to the LTL formula $\phi := \mathbf{G}\,\mathbf{F}\,\mathtt{survey} \wedge \mathbf{G}\,\neg\mathtt{unsafe}$.
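The paper works with Büchi automata rather than evaluating formulas directly. Still, the semantics of the example formula can be checked on ultimately periodic ("lasso") words $u v^\omega$. Such a check shows why $\mathbf{G}\,\mathbf{F}\,\mathtt{survey}$ constrains only the repeating part of the word, while $\mathbf{G}\,\neg\mathtt{unsafe}$ constrains every position. The sketch below uses a hypothetical tuple encoding of formulas. It evaluates each subformula on the finitely many positions of the lasso and computes $\mathbf{U}$ as a least fixpoint. It illustrates the standard semantics and is not a tool used in the paper.

```python
from typing import FrozenSet, List, Sequence


def holds_on_lasso(phi: tuple, prefix: Sequence[FrozenSet[str]],
                   cycle: Sequence[FrozenSet[str]]) -> bool:
    """Does the infinite word prefix . cycle^omega satisfy phi (at position 0)?

    Formulas are nested tuples: ("ap", name), ("true",), ("not", f),
    ("and", f, g), ("or", f, g), ("X", f), ("U", f, g), ("F", f), ("G", f).
    """
    if not cycle:
        raise ValueError("cycle must be non-empty")
    word = list(prefix) + list(cycle)
    n = len(word)
    succ = list(range(1, n)) + [len(prefix)]   # the last position loops back

    def ev(f: tuple) -> List[bool]:
        op = f[0]
        if op == "true":
            return [True] * n
        if op == "ap":
            return [f[1] in word[i] for i in range(n)]
        if op == "not":
            return [not b for b in ev(f[1])]
        if op in ("and", "or"):
            a, b = ev(f[1]), ev(f[2])
            return [(x and y) if op == "and" else (x or y) for x, y in zip(a, b)]
        if op == "X":
            a = ev(f[1])
            return [a[succ[i]] for i in range(n)]
        if op == "U":
            # least fixpoint of Z = g or (f and X Z)
            a, res = ev(f[1]), ev(f[2])
            changed = True
            while changed:
                changed = False
                for i in range(n):
                    if not res[i] and a[i] and res[succ[i]]:
                        res[i] = changed = True
            return res
        if op == "F":
            return ev(("U", ("true",), f[1]))
        if op == "G":
            return ev(("not", ("F", ("not", f[1]))))
        raise ValueError(f"unknown operator {op!r}")

    return ev(phi)[0]


phi = ("and", ("G", ("F", ("ap", "survey"))), ("G", ("not", ("ap", "unsafe"))))
E = frozenset()
ok = holds_on_lasso(phi, [E, frozenset({"base"})], [frozenset({"survey"}), E])
bad1 = holds_on_lasso(phi, [frozenset({"survey"})], [E])        # survey only once
bad2 = holds_on_lasso(phi, [frozenset({"unsafe"})], [frozenset({"survey"})])
print(ok, bad1, bad2)   # True False False
```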

The system is assumed to operate in an environment with dynamical events. In this paper, these events are modelled by a reward process $\mathcal{R} : Q \times \mathbb{N} \to \mathbb{R}^+$: the reward associated with state $q \in Q$ at time $k$ is $\mathcal{R}(q, k)$. Note that rewards are associated with the states in $Q$ in a time-varying fashion. We make no assumptions on the dynamics governing the rewards. We do make the natural assumption that, at time $k$, the system can only observe the rewards in a neighborhood $\mathcal{N}(q, k) \subseteq Q$ of the current state $q$. We also assume that the reward process $\mathcal{R}$ is unknown, so reward values must be observed and acted upon in real time. The case in which $\mathcal{R}$ is known a priori is interesting and will be addressed in future research.

The problem considered in this paper is formally stated 
next. 

Problem II.2. Given a transition system $\mathcal{T}$ and an LTL formula $\phi$ over the set of observations of $\mathcal{T}$, design a controller that maximizes the collected reward locally while ensuring that the produced infinite trajectory satisfies $\phi$.

Since the rewards are time-varying and can only be observed around the current state, we draw inspiration from MPC (see, e.g., [14]) and synthesize a controller that maximizes the rewards in a receding horizon fashion. At time $k$ and state $q_k$, the controller solves an on-line optimization problem that maximizes the rewards collected over a horizon $N$. This yields a finite trajectory $q_{k+1} q_{k+2} \ldots q_{k+N}$, and the system implements the immediate control action $(q_k, q_{k+1})$. The process is then repeated at time $k+1$ and state $q_{k+1}$.

To guarantee satisfaction of the LTL formula $\phi$, the proposed approach constructs an automaton that captures all satisfying trajectories of $\mathcal{T}$. This automaton also induces a Lyapunov-like function that can be used to enforce that the trajectory of the system satisfies the desired formula. These steps are described formally in Sec. III. The function is then used to guarantee recursive feasibility of the receding horizon controller, which in turn ensures that the synthesized infinite trajectory satisfies $\phi$. The controller synthesis method is presented in Sec. IV.



III. A Tool for Enforcing the Büchi Acceptance Condition

In this section, we review the definition of Büchi automata. We then describe the construction of a function that enforces the Büchi acceptance condition on the trajectories of a DTS.

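Definition III.1 (Büchi Automaton). A (nondeterministic) Büchi automaton is a tuple $\mathcal{B} = (S_{\mathcal{B}}, S_{\mathcal{B}0}, \Sigma, \delta_{\mathcal{B}}, F_{\mathcal{B}})$, where

• $S_{\mathcal{B}}$ is a finite set of states;

• $S_{\mathcal{B}0} \subseteq S_{\mathcal{B}}$ is the set of initial states;

• $\Sigma$ is the input alphabet;

• $\delta_{\mathcal{B}} : S_{\mathcal{B}} \times \Sigma \to 2^{S_{\mathcal{B}}}$ is the transition function;

• $F_{\mathcal{B}} \subseteq S_{\mathcal{B}}$ is the set of accepting states.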

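We write $s \xrightarrow{\sigma}_{\mathcal{B}} s'$ if $s' \in \delta_{\mathcal{B}}(s, \sigma)$. An infinite sequence $\sigma_0 \sigma_1 \ldots$ over $\Sigma$ generates trajectories $s_0 s_1 \ldots$ where $s_0 \in S_{\mathcal{B}0}$ and $s_k \xrightarrow{\sigma_k}_{\mathcal{B}} s_{k+1}$ for all $k \geq 0$. $\mathcal{B}$ accepts an infinite sequence over $\Sigma$ if the sequence generates at least one trajectory on $\mathcal{B}$ that intersects the set $F_{\mathcal{B}}$ infinitely many times.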

For any LTL formula $\phi$ over $\Pi$, one can construct a Büchi automaton with input alphabet $\Sigma = 2^{\Pi}$ that accepts all and only the sequences over $2^{\Pi}$ satisfying $\phi$ [12]. We refer readers to [18] for efficient algorithms and implementations that translate an LTL formula over $\Pi$ into a corresponding Büchi automaton $\mathcal{B}$.



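Definition III.2 (Weighted Product Automaton). Given a weighted DTS $\mathcal{T} = (Q, q_0, \Delta, \omega, \Pi, h)$ and a Büchi automaton $\mathcal{B} = (S_{\mathcal{B}}, S_{\mathcal{B}0}, 2^{\Pi}, \delta_{\mathcal{B}}, F_{\mathcal{B}})$, their product automaton, denoted by $\mathcal{P} = \mathcal{T} \times \mathcal{B}$, is a tuple $\mathcal{P} = (S_{\mathcal{P}}, S_{\mathcal{P}0}, \Delta_{\mathcal{P}}, \omega_{\mathcal{P}}, F_{\mathcal{P}})$, where

• $S_{\mathcal{P}} = Q \times S_{\mathcal{B}}$;

• $S_{\mathcal{P}0} = \{q_0\} \times S_{\mathcal{B}0}$;

• $\Delta_{\mathcal{P}} \subseteq S_{\mathcal{P}} \times S_{\mathcal{P}}$ is the set of transitions, defined by: $((q, s), (q', s')) \in \Delta_{\mathcal{P}}$ iff $q \to_{\mathcal{T}} q'$ and $s \xrightarrow{h(q)}_{\mathcal{B}} s'$;

• $\omega_{\mathcal{P}} : \Delta_{\mathcal{P}} \to \mathbb{R}^+$ is the weight function defined by: $\omega_{\mathcal{P}}((q, s), (q', s')) = \omega((q, q'))$;

• $F_{\mathcal{P}} = Q \times F_{\mathcal{B}}$.

We write $(q, s) \to_{\mathcal{P}} (q', s')$ if $((q, s), (q', s')) \in \Delta_{\mathcal{P}}$. A trajectory $\mathbf{p} = (q_0, s_0)(q_1, s_1) \ldots$ of $\mathcal{P}$ is an infinite sequence such that $(q_0, s_0) \in S_{\mathcal{P}0}$ and $(q_k, s_k) \to_{\mathcal{P}} (q_{k+1}, s_{k+1})$ for all $k \geq 0$. Trajectory $\mathbf{p}$ is called accepting if and only if it intersects $F_{\mathcal{P}}$ infinitely many times.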
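Def. III.2 can be implemented directly when the Büchi transition function is available as a function of the input letter. In the sketch below, `delta_B(s, sigma)` is assumed to return the set of successors of `s` on the letter `sigma` $= h(q)$. A translator such as LTL2BA outputs guards over propositions, which would first have to be evaluated on $h(q)$. The automaton in the example is a hand-written two-state Büchi automaton for $\mathbf{G}\,\mathbf{F}\,a$, and all names are hypothetical.

```python
from itertools import product


def build_product(Q, q0, succ_T, omega, h, S_B, S_B0, delta_B, F_B):
    """Weighted product P = T x B of Def. III.2 (sketch).

    succ_T[q]          : successors q' of q in T
    omega[(q, q2)]     : positive weight of the transition (q, q2) of T
    h[q]               : frozenset of observations at q
    delta_B(s, sigma)  : set of Buchi successors of s on the letter sigma
    """
    S_P = list(product(Q, S_B))
    S_P0 = [(q0, s) for s in S_B0]
    Delta_P = {}
    for q, s in S_P:
        Delta_P[(q, s)] = [((q2, s2), omega[(q, q2)])   # weight inherited from T
                           for q2 in succ_T[q]
                           for s2 in delta_B(s, h[q])]  # B reads h(q) of the source
    F_P = {(q, s) for q, s in S_P if s in F_B}
    return S_P, S_P0, Delta_P, F_P


def project(p_traj):
    """Projection pi of (1): drop the automaton component of each (q, s)."""
    return [q for q, _s in p_traj]


# Hypothetical example: T has states A (observes a) and B (observes nothing).
succ_T = {"A": ["B"], "B": ["A", "B"]}
omega = {("A", "B"): 1.0, ("B", "A"): 1.0, ("B", "B"): 1.0}
h = {"A": frozenset({"a"}), "B": frozenset()}


def delta_B(s, sigma):
    # Two-state Buchi automaton for G F a: state 1 is entered exactly after reading a.
    return {1} if "a" in sigma else {0}


S_P, S_P0, Delta_P, F_P = build_product(["A", "B"], "A", succ_T, omega, h,
                                        [0, 1], [0], delta_B, {1})
print(len(S_P), S_P0, sorted(F_P))   # 4 [('A', 0)] [('A', 1), ('B', 1)]
print(Delta_P[("A", 0)])             # [(('B', 1), 1.0)]
print(project([("A", 0), ("B", 1), ("B", 0)]))   # ['A', 'B', 'B']
```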

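We define the projection $\pi$ of $\mathbf{p}$ onto $\mathcal{T}$ as simply removing the automaton states, i.e.,

$$\pi(\mathbf{p}) = \mathbf{q} = q_0 q_1 \ldots, \quad \text{if } \mathbf{p} = (q_0, s_0)(q_1, s_1) \ldots \tag{1}$$

We also use the projection operator $\pi$ for finite trajectories (subsequences of $\mathbf{p}$). Note that a trajectory $\mathbf{p}$ on $\mathcal{P}$ projects uniquely to a trajectory $\pi(\mathbf{p})$ on $\mathcal{T}$. By the construction of $\mathcal{P}$ from $\mathcal{T}$ and $\mathcal{B}$, if $\mathbf{p}$ is accepting, then $\mathbf{q} = \pi(\mathbf{p})$ satisfies the LTL formula corresponding to $\mathcal{B}$. Conversely, every trajectory of $\mathcal{T}$ satisfying this formula is the projection of some accepting trajectory of $\mathcal{P}$ [12].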

In [16], we introduced a non-negative function $V$ (which may take the value $\infty$) on the states of the product automaton $\mathcal{P}$. It uses the weights $\omega_{\mathcal{P}}$ to enforce the acceptance condition of the automaton. Conceptually, this function resembles a Lyapunov, or energy, function. In Lyapunov theory, energy functions are used to enforce that the trajectories of a dynamical system converge to an equilibrium. Here, the "energy" function enforces that the trajectories of $\mathcal{T}$ satisfy the acceptance condition of a Büchi automaton.

To define the energy function, we first call a set $A \subseteq S_{\mathcal{P}}$ self-reachable if and only if every state in $A$ can reach at least one state in $A$ via a path of at least one transition.

Definition III.3 (Energy function of a state in $\mathcal{P}$). We define $F^{\star}_{\mathcal{P}}$ to be the largest self-reachable subset of $F_{\mathcal{P}}$. The energy function $V(p)$, $p \in S_{\mathcal{P}}$, is the graph distance of $p$ to the set $F^{\star}_{\mathcal{P}}$, i.e., the accumulated weight of the shortest path from $p$ to any state in $F^{\star}_{\mathcal{P}}$. If no such path exists, $V(p) = \infty$.

Fig. 2 shows an example of $\mathcal{T}$, $\mathcal{B}$, and their product $\mathcal{P}$, together with the induced energy function on the states of $\mathcal{P}$. In [16], we showed the following properties of $V$.

Theorem III.4 (Properties of the energy function). $V$ satisfies the following:

(i) If a trajectory $\mathbf{p}$ on $\mathcal{P}$ is accepting, then it cannot contain a state $p$ where $V(p) = \infty$.

(ii) All accepting states in an accepting trajectory $\mathbf{p}$ are in the set $F^{\star}_{\mathcal{P}}$ and have energy equal to 0; all accepting states that are not in $F^{\star}_{\mathcal{P}}$ have energy equal to $\infty$.

(iii) For each state $p \in S_{\mathcal{P}}$, if $V(p) > 0$ and $V(p) \neq \infty$, then there exists a state $p'$ with $p \to_{\mathcal{P}} p'$ such that $V(p') < V(p)$.
Fig. 2. The construction of the product automaton and the energy function on its states. In this example, the set of observations is $\Pi = \{a, b\}$. The initial states are indicated by incoming arrows, and the accepting states are marked by double strokes. (a): A weighted DTS $\mathcal{T}$. The label atop each state indicates the set of associated observations (e.g., $\{a, b\}$ means both $a$ and $b$ are observed). The labels on the transitions indicate the weights. (b): The Büchi automaton $\mathcal{B}$ corresponding to the LTL formula $\mathbf{G}\,(\mathbf{F}\,(a \wedge \mathbf{F}\,b))$, translated by the tool LTL2BA [18]. (c): The product automaton $\mathcal{P} = \mathcal{T} \times \mathcal{B}$ constructed with Def. III.2 (the weights are inherited from $\mathcal{T}$ and not shown). The number above a state $p \in S_{\mathcal{P}}$ is the energy $V(p)$. Note that in this example $F^{\star}_{\mathcal{P}} = F_{\mathcal{P}}$, so $V(p)$ is the graph distance from $p$ to the nearest accepting state.



These facts show that $V(p)$ behaves like an energy function, which justifies the name. We refer to the value of $V(p)$ at a state $p \in S_{\mathcal{P}}$ as the "energy of the state". Satisfying the LTL formula is equivalent to reaching states where $V(p) = 0$ infinitely many times. Therefore, for each state $p \in S_{\mathcal{P}}$, $V(p)$ measures progress towards satisfying the LTL formula. An algorithm that computes $V(p)$ for an arbitrary product automaton can be found in [16].
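The algorithm of [16] is not reproduced here. As a sketch of Def. III.3 only, one straightforward way to obtain $V$ has two steps. First, compute $F^{\star}_{\mathcal{P}}$ by repeatedly discarding accepting states that cannot reach the remaining set through at least one transition. Second, run Dijkstra's algorithm backwards from $F^{\star}_{\mathcal{P}}$. The product automaton is assumed to be a dictionary mapping each state to a list of `(successor, weight)` pairs with positive weights. The function names are hypothetical.

```python
import heapq
import itertools
import math


def _reaches(Delta, start, targets):
    """True if a path with at least one transition leads from start into targets."""
    stack, seen = [p for p, _w in Delta[start]], set()
    while stack:
        p = stack.pop()
        if p in targets:
            return True
        if p not in seen:
            seen.add(p)
            stack.extend(p2 for p2, _w in Delta[p])
    return False


def largest_self_reachable(Delta, F):
    """F_star: prune F until every remaining state can reach the set again."""
    F_star = set(F)
    changed = True
    while changed:
        changed = False
        for f in list(F_star):
            if not _reaches(Delta, f, F_star):
                F_star.discard(f)
                changed = True
    return F_star


def energy(Delta, F):
    """V(p): weighted shortest-path distance from p to F_star (math.inf if unreachable)."""
    F_star = largest_self_reachable(Delta, F)
    pred = {p: [] for p in Delta}
    for p, succs in Delta.items():
        for p2, w in succs:
            pred[p2].append((p, w))
    V = {p: math.inf for p in Delta}
    tie = itertools.count()               # tie-breaker so the heap never compares states
    heap = []
    for f in F_star:
        V[f] = 0.0
        heap.append((0.0, next(tie), f))
    heapq.heapify(heap)
    while heap:                           # Dijkstra on the reversed graph
        d, _, p = heapq.heappop(heap)
        if d > V[p]:
            continue
        for p_prev, w in pred[p]:
            if d + w < V[p_prev]:
                V[p_prev] = d + w
                heapq.heappush(heap, (d + w, next(tie), p_prev))
    return V


# Hypothetical product automaton: A and C are accepting, D is a trap.
Delta = {"A": [("B", 2.0)], "B": [("A", 3.0), ("C", 1.0)],
         "C": [("D", 1.0)], "D": [("D", 1.0)]}
V = energy(Delta, {"A", "C"})
print(V)   # {'A': 0.0, 'B': 3.0, 'C': inf, 'D': inf}
```

In the example, C is accepting but only leads into the trap D. It is therefore pruned from $F^{\star}_{\mathcal{P}}$ and receives infinite energy, as in Thm. III.4 (ii).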

IV. Main Results: Receding Horizon Controller Design



In this section, we present a solution to Prob. II.2. The central component of our control design is a state-feedback controller on the product automaton. It optimizes finite trajectories over a pre-determined, fixed horizon $N$, subject to certain constraints. These constraints ensure that the energy of the states on the product automaton decreases in finite time, which guarantees progress towards satisfying the LTL formula. Note that the proposed controller does not force the energy to decrease at every time-step, only eventually. The finite trajectory returned by the receding horizon controller is projected onto $\mathcal{T}$, and the controller applies the first transition. The process is then repeated at the next time-step.

We first describe the receding horizon controller and show that it is feasible (a solution exists) at all time-steps $k \in \mathbb{N}$. We then present the general control algorithm and show that it always produces (infinite) trajectories satisfying the given LTL formula.

A. Receding horizon controller 

To explain how the controller works, we first define a finite predicted trajectory on $\mathcal{P}$ at time $k$. Denote the current state at time $k$ as $p_k$. A predicted trajectory of horizon $N$ at time $k$ is a finite sequence $\mathbf{P}_k := p_{1|k} \ldots p_{N|k}$ that satisfies three conditions: $p_{i|k} \in S_{\mathcal{P}}$ for all $i = 1, \ldots, N$; $p_{i|k} \to_{\mathcal{P}} p_{i+1|k}$ for all $i = 1, \ldots, N-1$; and $p_k \to_{\mathcal{P}} p_{1|k}$. The notation $p_{i|k}$, common in MPC, denotes the $i$th state of the predicted trajectory at time $k$. We denote by $\mathbb{P}(p_k, N)$ the set of all finite trajectories of horizon $N$ from a state $p_k \in S_{\mathcal{P}}$. The finite predicted trajectory $\mathbf{P}_k$ of $\mathcal{P}$ projects uniquely to a finite trajectory $\mathbf{q}_k := \pi(\mathbf{P}_k)$ of $\mathcal{T}$.
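The set $\mathbb{P}(p_k, N)$ can be enumerated by depth-first search. This is also the brute-force approach whose cost is discussed in Sec. IV-C. A minimal sketch follows, using a dictionary-of-successors representation of $\Delta_{\mathcal{P}}$ (the names `Graph` and `predicted_trajectories` are hypothetical).

```python
from typing import Dict, Hashable, Iterator, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]   # state -> [(successor, weight)]


def predicted_trajectories(Delta: Graph, p: Hashable, N: int) -> Iterator[Tuple]:
    """Yield every P = (p_{1|k}, ..., p_{N|k}) with p -> p_{1|k} -> ... -> p_{N|k}."""
    if N == 0:
        yield ()
        return
    for p1, _w in Delta[p]:
        for rest in predicted_trajectories(Delta, p1, N - 1):
            yield (p1,) + rest


G = {"x": [("x", 1.0), ("y", 1.0)], "y": [("x", 1.0), ("y", 1.0)]}
print(len(list(predicted_trajectories(G, "x", 3))))   # 8
print(list(predicted_trajectories(G, "x", 2)))
# [('x', 'x'), ('x', 'y'), ('y', 'x'), ('y', 'y')]
```

With two successors per state, $|\mathbb{P}(p_k, N)| = 2^N$. This exponential growth in $N$ is what the complexity discussion refers to.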

For the current state $q_k$ at time $k$, we denote the observed reward at any state $q \in Q$ as $R_k(q)$, where

$$R_k(q) = \begin{cases} \mathcal{R}(q, k) & \text{if } q \in \mathcal{N}(q_k, k), \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

Note that $R_k(q) = 0$ if $q \notin \mathcal{N}(q_k, k)$, because rewards outside the neighbourhood cannot be observed. We can now define the predicted reward associated with a predicted trajectory $\mathbf{P}_k \in \mathbb{P}(p_k, N)$ at time $k$. The predicted reward of $\mathbf{P}_k$, denoted $\Re_k(\mathbf{P}_k)$, is simply the reward accumulated along $\pi(\mathbf{P}_k)$:

$$\Re_k(\mathbf{P}_k) = \sum_{i=1}^{N} R_k\big(\pi(p_{i|k})\big). \tag{3}$$
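Equations (2) and (3) translate into two short functions. In this sketch, the neighbourhood $\mathcal{N}(q_k, k)$ and the reward process $\mathcal{R}$ are passed in as callables. The example data are hypothetical: a line-shaped state space, a radius-2 neighbourhood, and $\mathcal{R}(q, k) = q + k$.

```python
from typing import Callable, Dict, Hashable, Iterable, Sequence, Set, Tuple


def observed_rewards(Q: Iterable[Hashable], q_k: Hashable, k: int,
                     R: Callable[[Hashable, int], float],
                     neighborhood: Callable[[Hashable, int], Set[Hashable]]
                     ) -> Dict[Hashable, float]:
    """Eq. (2): R_k(q) = R(q, k) if q lies in N(q_k, k), and 0 otherwise."""
    nbhd = neighborhood(q_k, k)
    return {q: (R(q, k) if q in nbhd else 0.0) for q in Q}


def predicted_reward(R_k: Dict[Hashable, float],
                     P: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Eq. (3): sum of R_k over the projection pi(P); P holds (q, s) pairs."""
    return sum(R_k[q] for (q, _s) in P)


R_0 = observed_rewards(range(6), 1, 0,
                       R=lambda q, k: float(q + k),
                       neighborhood=lambda qc, k: {q for q in range(6) if abs(q - qc) <= 2})
print(R_0)   # {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0, 4: 0.0, 5: 0.0}
print(predicted_reward(R_0, [(2, "s0"), (3, "s1"), (4, "s1")]))   # 5.0
```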



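We next describe the receding horizon controller executed at the initial state at time $k = 0$. This is a special case because the initial state of $\mathcal{P}$ is not unique: any initial state of $\mathcal{P}$ can be picked from the set $S_{\mathcal{P}0} = \{q_0\} \times S_{\mathcal{B}0}$. We denote the controller executed at the initial state as $\mathrm{RH}^0(S_{\mathcal{P}0})$ and define it as follows:

$$\mathbf{P}^{\star}_0 = \mathrm{RH}^0(S_{\mathcal{P}0}) := \operatorname*{arg\,max}_{\mathbf{P}_0 \in \mathbb{P}(p_0, N),\; p_0 \in S_{\mathcal{P}0},\; V(p_0) < \infty} \Re_0(\mathbf{P}_0). \tag{4}$$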



The controller maximizes the predicted cumulative reward of the projected trajectories over all predicted trajectories of horizon $N$ that start from a state $p_0 \in S_{\mathcal{P}0}$ with finite energy. It returns the optimal predicted trajectory $\mathbf{P}^{\star}_0$. The requirement $V(p_0) < \infty$ is critical, because otherwise the trajectory starting from $p_0$ cannot be accepting. If no $p_0$ satisfies $V(p_0) < \infty$, then no accepting trajectory exists and no trajectory of $\mathcal{T}$ satisfies the LTL formula (i.e., Prob. II.2 has no solution).
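Lemma IV.1 (Feasibility of (4)). Optimization problem (4) always has at least one solution if there exists $p_0 \in S_{\mathcal{P}0}$ such that $V(p_0) < \infty$.

Proof. The set $\mathbb{P}(p_0, N)$ is finite, so it suffices to show that it is not empty. Since $V(p_0) < \infty$, there is a path from $p_0$ to a state in $F^{\star}_{\mathcal{P}}$. Since $F^{\star}_{\mathcal{P}}$ is self-reachable, this path can be extended indefinitely. Hence $\mathbb{P}(p_0, N) \neq \emptyset$. ■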
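Below is a brute-force sketch of $\mathrm{RH}^0$ in (4). It assumes that the energy $V$ has already been computed and that `reward(P)` returns the predicted reward $\Re_0(\mathbf{P})$. Returning `None` corresponds to the case where no $p_0 \in S_{\mathcal{P}0}$ has finite energy. The toy product automaton, its energies and its rewards are invented for illustration, and the function names are hypothetical.

```python
import math


def predicted_trajectories(Delta, p, N):
    """All P = (p_{1|0}, ..., p_{N|0}) with p -> p_{1|0} -> ... -> p_{N|0}."""
    if N == 0:
        yield ()
        return
    for p1, _w in Delta[p]:
        for rest in predicted_trajectories(Delta, p1, N - 1):
            yield (p1,) + rest


def rh0(Delta, V, S_P0, N, reward):
    """RH^0(S_P0) of (4) by exhaustive search; returns (p0, P0*) or None."""
    best = None
    for p0 in S_P0:
        if V[p0] == math.inf:          # such p0 can never start an accepting run
            continue
        for P in predicted_trajectories(Delta, p0, N):
            value = reward(P)
            if best is None or value > best[0]:
                best = (value, p0, P)
    return None if best is None else (best[1], best[2])


Delta = {"i1": [("z", 1.0)], "i2": [("x", 1.0), ("z", 1.0)],
         "x": [("y", 1.0)], "y": [("y", 1.0)], "z": [("z", 1.0)]}
V = {"i1": math.inf, "i2": 2.0, "x": 1.0, "y": 0.0, "z": math.inf}
r = {"i1": 0, "i2": 0, "x": 1, "y": 2, "z": 0}
print(rh0(Delta, V, ["i1", "i2"], 2, lambda P: sum(r[p] for p in P)))
# ('i2', ('x', 'y'))
```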

Next, we present the receding horizon controller for any time instant $k = 1, 2, \ldots$ and the corresponding state $p_k \in S_{\mathcal{P}}$. This controller has the form

$$\mathbf{P}^{\star}_k = \mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1}), \tag{5}$$

i.e., it depends both on the current state $p_k$ and on the optimal predicted trajectory $\mathbf{P}^{\star}_{k-1} = p^{\star}_{1|k-1} \ldots p^{\star}_{N|k-1}$ obtained at the previous time-step. By the nature of a receding horizon control scheme, the first control of the previous predicted trajectory is always applied. We therefore have the following equality:

$$p_k = p^{\star}_{1|k-1}, \qquad k = 1, 2, \ldots \tag{6}$$



As will become clear below, $\mathbf{P}^{\star}_{k-1}$ is used to ensure that repeated executions of this controller eventually reduce the energy of the state on $\mathcal{P}$ to 0.

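We define controller (5) through the following three cases:

1) Case 1. $V(p_k) > 0$ and $V(p^{\star}_{i|k-1}) \neq 0$ for all $i = 1, \ldots, N$. In this case, the receding horizon controller is defined as follows:

$$\mathbf{P}^{\star}_k = \mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1}) := \operatorname*{arg\,max}_{\mathbf{P}_k \in \mathbb{P}(p_k, N)} \Re_k(\mathbf{P}_k) \quad \text{subject to: } V(p_{N|k}) < V(p^{\star}_{N|k-1}). \tag{7}$$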

The key to guaranteeing that the energy of the states on $\mathcal{P}$ eventually decreases is the terminal constraint $V(p_{N|k}) < V(p^{\star}_{N|k-1})$. The optimal finite predicted trajectory $\mathbf{P}^{\star}_k$ must end at a state with lower energy than the last state of the previous predicted trajectory $\mathbf{P}^{\star}_{k-1}$. This terminal constraint mechanism is illustrated in Fig. 3.



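[Figure: energy $V$ of the predicted states at time $k-1$ ($p^{\star}_{1|k-1}, p^{\star}_{2|k-1}, \ldots$) and at time $k$ ($p^{\star}_{1|k}, \ldots, p^{\star}_{N|k}$ and $p_{1|k}, \ldots, p_{N|k}$).]

Fig. 3. Constraints enforced for the receding horizon control law $\mathbf{P}^{\star}_k = \mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ for Cases 1 and 2.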

To verify the feasibility of the optimization problem under this constraint, we use the third property of $V$ in Thm. III.4: each state with positive finite energy can make a transition to a state with strictly lower energy.



Lemma IV.2 (Feasibility of (7)). Optimization problem (7) always has at least one solution if $V(p_k) < \infty$.

Proof. Let $\mathbf{P}^{\star}_{k-1} = p^{\star}_{1|k-1} \ldots p^{\star}_{N|k-1}$. Since $p_k = p^{\star}_{1|k-1}$, we have $p_k \to_{\mathcal{P}} p^{\star}_{2|k-1}$. We can therefore construct a finite predicted trajectory $\mathbf{P}_k = p_{1|k} \ldots p_{N|k}$ with $p_{i|k} = p^{\star}_{i+1|k-1}$ for all $i = 1, \ldots, N-1$. In Case 1, $V(p_{N-1|k}) = V(p^{\star}_{N|k-1}) > 0$. By Thm. III.4 (iii), there is a state $p$ with $p_{N-1|k} \to_{\mathcal{P}} p$ such that $V(p) < V(p_{N-1|k})$. Setting $p_{N|k} = p$, the finite trajectory $\mathbf{P}_k = p_{1|k} \ldots p_{N|k} \in \mathbb{P}(p_k, N)$ satisfies the constraint $V(p_{N|k}) < V(p^{\star}_{N|k-1})$. Hence (7) has at least one solution. ■
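The proof of Lemma IV.2 is constructive. It shifts the previous optimal prediction by one step and appends a successor with strictly lower energy. The sketch below builds this "shifted" candidate, familiar from MPC feasibility proofs, on a hypothetical line-shaped product automaton $0 - 1 - 2 - 3 - 4$ with $F^{\star}_{\mathcal{P}} = \{0\}$, unit weights and $V(i) = i$. When $N = 1$ the shifted part is empty, and the appended state is a lower-energy successor of $p_k$ itself. The function name `case1_witness` is hypothetical.

```python
def case1_witness(Delta, V, P_prev):
    """Feasible candidate for (7), following the proof of Lemma IV.2 (sketch)."""
    shifted = list(P_prev[1:])          # p_{i|k} = p*_{i+1|k-1}, i = 1..N-1
    last = P_prev[-1]                   # p*_{N|k-1}, which equals p_{N-1|k} (or p_k if N = 1)
    for p, _w in Delta[last]:
        if V[p] < V[last]:              # exists by Thm. III.4 (iii) when 0 < V[last] < inf
            return tuple(shifted + [p])
    raise ValueError("no lower-energy successor: Case 1 assumptions violated")


Delta = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)],
         3: [(2, 1.0), (4, 1.0)], 4: [(3, 1.0)]}
V = {i: float(i) for i in Delta}
P_prev = (4, 3, 4)                      # previous prediction; p_k = P_prev[0] = 4
print(case1_witness(Delta, V, P_prev))  # (3, 4, 3)
```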

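2) Case 2. $V(p_k) > 0$ and there exists $i \in \{1, \ldots, N\}$ with $V(p^{\star}_{i|k-1}) = 0$. We denote by $i^0(\mathbf{P}^{\star}_{k-1})$ the index of the first state in $\mathbf{P}^{\star}_{k-1}$ with zero energy, i.e., $V(p^{\star}_{i^0(\mathbf{P}^{\star}_{k-1})|k-1}) = 0$. We then propose the following controller:

$$\mathbf{P}^{\star}_k = \mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1}) := \operatorname*{arg\,max}_{\mathbf{P}_k \in \mathbb{P}(p_k, N)} \Re_k(\mathbf{P}_k) \quad \text{subject to: } V(p_{i^0(\mathbf{P}^{\star}_{k-1})-1|k}) = 0. \tag{8}$$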



In other words, if the previous predicted trajectory contains a state with energy 0, this controller forces the optimal predicted trajectory to contain such a state as well, one position earlier. This constraint is illustrated in Fig. 3. If $i^0(\mathbf{P}^{\star}_{k-1}) = 1$, then by (6) the current state $p_k$ satisfies $V(p_k) = 0$. In that situation Case 2 does not apply, and Case 3 (described below) applies instead.
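Lemma IV.3 (Feasibility of (8)). Optimization problem (8) always has at least one solution if $V(p_k) < \infty$.

Proof. Let $\mathbf{P}^{\star}_{k-1} = p^{\star}_{1|k-1} \ldots p^{\star}_{N|k-1}$. Since $p_k = p^{\star}_{1|k-1}$, we have $p_k \to_{\mathcal{P}} p^{\star}_{2|k-1}$. We can therefore construct a finite predicted trajectory $\mathbf{P}_k = p_{1|k} \ldots p_{N|k}$ with $p_{i|k} = p^{\star}_{i+1|k-1}$ for all $i = 1, \ldots, N-1$. Let $p_{N|k}$ be any state with $p_{N-1|k} \to_{\mathcal{P}} p_{N|k}$ and $V(p_{N|k}) < \infty$. Then $\mathbf{P}_k = p_{1|k} \ldots p_{N|k} \in \mathbb{P}(p_k, N)$ satisfies the constraint, since $p_{i^0-1|k} = p^{\star}_{i^0|k-1}$ has energy 0. Such a state $p_{N|k}$ exists. If $V(p_{N-1|k}) > 0$, this follows from Thm. III.4 (iii). If $V(p_{N-1|k}) = 0$, it follows from the self-reachability of $F^{\star}_{\mathcal{P}}$. ■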
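The candidate used in the proof of Lemma IV.3 works the same way. It keeps the shifted prediction, which already places the zero-energy state $p^{\star}_{i^0|k-1}$ at position $i^0 - 1$, and appends any finite-energy successor. The sketch below uses the same hypothetical line-shaped automaton with $V(i) = i$. The helper `first_zero_index` (a hypothetical name) returns $i^0(\mathbf{P}^{\star}_{k-1})$ as a 1-based index.

```python
import math


def first_zero_index(V, P):
    """1-based index i^0(P) of the first state of P with zero energy, or None."""
    return next((i for i, p in enumerate(P, start=1) if V[p] == 0), None)


def case2_witness(Delta, V, P_prev):
    """Feasible candidate for (8), following the proof of Lemma IV.3 (sketch)."""
    i0 = first_zero_index(V, P_prev)
    if i0 is None or i0 < 2:
        raise ValueError("Case 2 needs a zero-energy state after p_k = P_prev[0]")
    shifted = list(P_prev[1:])                 # p_{i|k} = p*_{i+1|k-1}
    for p, _w in Delta[P_prev[-1]]:
        if V[p] < math.inf:                    # any finite-energy successor will do
            P = tuple(shifted + [p])
            assert V[P[i0 - 2]] == 0           # constraint of (8) on p_{i0-1|k}
            return P
    raise ValueError("no finite-energy successor of the last predicted state")


Delta = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)],
         3: [(2, 1.0), (4, 1.0)], 4: [(3, 1.0)]}
V = {i: float(i) for i in Delta}
print(first_zero_index(V, (2, 1, 0)))         # 3
print(case2_witness(Delta, V, (2, 1, 0)))     # (1, 0, 1)
```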



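3) Case 3. $V(p_k) = 0$. In this case, the terminal constraint only requires the energy of the terminal state to be finite. The controller is defined as follows:

$$\mathbf{P}^{\star}_k = \mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1}) := \operatorname*{arg\,max}_{\mathbf{P}_k \in \mathbb{P}(p_k, N)} \Re_k(\mathbf{P}_k) \quad \text{subject to: } V(p_{N|k}) < \infty. \tag{9}$$

Lemma IV.4 (Feasibility of (9)). Optimization problem (9) always has at least one solution.

Proof. If $V(p_k) = 0$, then there exists $p_{1|k}$ such that $p_k \to_{\mathcal{P}} p_{1|k}$ and $V(p_{1|k}) < \infty$ (otherwise $V(p_k)$ would equal $\infty$). If $V(p_{1|k}) > 0$, Thm. III.4 (iii) gives a state $p_{2|k}$ with $p_{1|k} \to_{\mathcal{P}} p_{2|k}$ and $V(p_{2|k}) < V(p_{1|k}) < \infty$. If $V(p_{1|k}) = 0$, the same argument as for $p_k$ applies. By induction, there exists $\mathbf{P}_k \in \mathbb{P}(p_k, N)$ such that $V(p_{N|k}) < \infty$. ■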



Remark IV.5. The proposed receding horizon control law extends the terminal constraint approach of model predictive control [14] to finite deterministic systems. The Büchi acceptance condition, combined with the energy function $V$, yields a non-conservative analogue of the terminal constraint approach. It uses either a terminal inequality condition (7) or a terminal equality condition (8).
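Putting the three cases together, the sketch below implements $\mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ by exhaustive enumeration of $\mathbb{P}(p_k, N)$. It filters the candidates by the constraint of (7), (8) or (9) and maximizes the predicted reward. It assumes a precomputed energy dictionary `V` and a callable `reward(P)` standing for $\Re_k$. It illustrates the control law and is not the authors' implementation. Note the index shift in Case 2: the constraint on $p_{i^0-1|k}$ becomes position `i0 - 2` in a 0-based tuple. The example reuses the hypothetical line-shaped automaton with $V(i) = i$ and the reward $\sum_i p_{i|k}$.

```python
import math


def predicted_trajectories(Delta, p, N):
    """All P = (p_{1|k}, ..., p_{N|k}) in P(p, N), by depth-first enumeration."""
    if N == 0:
        yield ()
        return
    for p1, _w in Delta[p]:
        for rest in predicted_trajectories(Delta, p1, N - 1):
            yield (p1,) + rest


def first_zero_index(V, P):
    """1-based index i^0(P) of the first zero-energy state of P, or None."""
    return next((i for i, p in enumerate(P, start=1) if V[p] == 0), None)


def rh(Delta, V, reward, p_k, P_prev):
    """RH(p_k, P*_{k-1}) of (7)-(9); reward(P) stands for Re_k(P)."""
    N = len(P_prev)
    if V[p_k] == 0:                                   # Case 3, eq. (9)
        def feasible(P):
            return V[P[-1]] < math.inf
    else:
        i0 = first_zero_index(V, P_prev)
        if i0 is None:                                # Case 1, eq. (7)
            bound = V[P_prev[-1]]

            def feasible(P):
                return V[P[-1]] < bound
        else:                                         # Case 2, eq. (8); i0 >= 2 here
            def feasible(P):
                return V[P[i0 - 2]] == 0              # p_{i0-1|k}, 0-based index i0-2
    candidates = [P for P in predicted_trajectories(Delta, p_k, N) if feasible(P)]
    return max(candidates, key=reward)                # non-empty by Lemmas IV.2-IV.4


Delta = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)],
         3: [(2, 1.0), (4, 1.0)], 4: [(3, 1.0)]}
V = {i: float(i) for i in Delta}
print(rh(Delta, V, sum, 4, (4, 3, 4)))   # Case 1 -> (3, 4, 3)
print(rh(Delta, V, sum, 2, (2, 1, 0)))   # Case 2 -> (1, 0, 1)
print(rh(Delta, V, sum, 0, (0, 1, 2)))   # Case 3 -> (1, 2, 3)
```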

B. Control algorithm and its correctness 

The overall control strategy for the transition system $\mathcal{T}$ is given in Alg. 1. The product automaton and the energy function are computed off-line. The algorithm then applies the receding horizon controller $\mathrm{RH}^0(S_{\mathcal{P}0})$ at time $k = 0$ and $\mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ at times $k > 0$. At each iteration, the receding horizon controller returns the optimal predicted trajectory $\mathbf{P}^{\star}_k$. The immediate transition $(p_k, p^{\star}_{1|k})$ is applied on $\mathcal{P}$, and the corresponding transition $(q_k, \pi(p^{\star}_{1|k}))$ is applied on $\mathcal{T}$. This process is then repeated at time $k+1$.

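Algorithm 1 Receding horizon control algorithm for $\mathcal{T} = (Q, q_0, \Delta, \omega, \Pi, h)$, given an LTL formula $\phi$ over $\Pi$

Executed Off-line:
1: Construct a Büchi automaton $\mathcal{B}$ corresponding to $\phi$.
2: Construct the product automaton $\mathcal{P} = \mathcal{T} \times \mathcal{B} = (S_{\mathcal{P}}, S_{\mathcal{P}0}, \Delta_{\mathcal{P}}, \omega_{\mathcal{P}}, F_{\mathcal{P}})$. Find $V(p)$ for all $p \in S_{\mathcal{P}}$ [16].

Executed On-line:
1: if there exists $p_0 \in S_{\mathcal{P}0}$ such that $V(p_0) < \infty$ then
2:     Set $k = 0$.
3:     Observe rewards for all $q \in \mathcal{N}(q_0, 0)$ and obtain $R_0(q)$.
4:     Obtain $\mathbf{P}^{\star}_0 = \mathrm{RH}^0(S_{\mathcal{P}0})$.
5:     Implement transition $(p_0, p^{\star}_{1|0})$ on $\mathcal{P}$ and transition $(q_0, \pi(p^{\star}_{1|0}))$ on $\mathcal{T}$.
6:     Set $k = 1$.
7:     loop
8:         Observe rewards for all $q \in \mathcal{N}(q_k, k)$ and obtain $R_k(q)$.
9:         Obtain $\mathbf{P}^{\star}_k = \mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$.
10:        Implement transition $(p_k, p^{\star}_{1|k})$ on $\mathcal{P}$ and transition $(q_k, \pi(p^{\star}_{1|k}))$ on $\mathcal{T}$.
11:        Set $k \leftarrow k + 1$.
12:    end loop
13: else
14:    There is no run originating from $q_0$ that satisfies $\phi$.
15: end if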
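A compact simulation sketch of the on-line part of Alg. 1 follows. It assumes the off-line steps are already done, i.e., the product automaton is available as a successor dictionary together with its energy `V`. Reward observation and the projection onto $\mathcal{T}$ are left out. The predicted reward used here, `reward(P, k) = sum(P)`, is hypothetical and time-invariant. On the line-shaped toy automaton it always pulls the system toward the high-energy end. The helper names are likewise hypothetical.

```python
import math


def predicted_trajectories(Delta, p, N):
    if N == 0:
        yield ()
        return
    for p1, _w in Delta[p]:
        for rest in predicted_trajectories(Delta, p1, N - 1):
            yield (p1,) + rest


def rh(Delta, V, reward, p_k, P_prev):
    """Cases 1-3 of RH(p_k, P*_{k-1}), eqs. (7)-(9), by exhaustive search."""
    N = len(P_prev)
    zeros = [i for i, p in enumerate(P_prev) if V[p] == 0]   # 0-based positions
    if V[p_k] == 0:                                          # Case 3
        ok = lambda P: V[P[-1]] < math.inf
    elif not zeros:                                          # Case 1
        ok = lambda P: V[P[-1]] < V[P_prev[-1]]
    else:                                                    # Case 2: p_{i0-1|k} has V = 0
        ok = lambda P: V[P[zeros[0] - 1]] == 0
    return max((P for P in predicted_trajectories(Delta, p_k, N) if ok(P)), key=reward)


def run_algorithm1(Delta, V, S_P0, N, reward, steps):
    """On-line part of Alg. 1; returns p_0, p_1, ..., p_steps (or None)."""
    starts = [p0 for p0 in S_P0 if V[p0] < math.inf]
    if not starts:
        return None                                 # no run from q_0 satisfies phi
    # k = 0: RH^0(S_P0) of (4)
    _, p0, P_prev = max(((reward(P, 0), p0, P) for p0 in starts
                         for P in predicted_trajectories(Delta, p0, N)),
                        key=lambda t: t[0])
    traj = [p0, P_prev[0]]                          # apply (p_0, p*_{1|0})
    for k in range(1, steps):
        p_k = P_prev[0]                             # eq. (6): p_k = p*_{1|k-1}
        P_prev = rh(Delta, V, lambda P: reward(P, k), p_k, P_prev)
        traj.append(P_prev[0])                      # apply (p_k, p*_{1|k})
    return traj


Delta = {0: [(1, 1.0)], 1: [(0, 1.0), (2, 1.0)], 2: [(1, 1.0), (3, 1.0)],
         3: [(2, 1.0), (4, 1.0)], 4: [(3, 1.0)]}
V = {i: float(i) for i in Delta}
traj = run_algorithm1(Delta, V, S_P0=[2], N=3, reward=lambda P, k: sum(P), steps=18)
print(traj)
# [2, 3, 4, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
```

After the initial transient, the trajectory returns to the zero-energy state 0 every six steps, even though the reward keeps pulling it away. This matches the infinitely many visits to $F^{\star}_{\mathcal{P}}$ established in Thm. IV.7.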



First, we show that the receding horizon controllers used in Alg. 1 are always feasible. We use a recursive argument: if the problem is feasible for the initial state, i.e., at time $k = 0$, then it remains feasible at all future time-steps $k = 1, 2, \ldots$.

Theorem IV.6 (Recursive Feasibility). If there exists $p_0 \in S_{\mathcal{P}0}$ such that $V(p_0) < \infty$, then $\mathrm{RH}^0(S_{\mathcal{P}0})$ is feasible and $\mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ is feasible for all $k = 1, 2, \ldots$.

Proof. By Lemma IV.1, $\mathrm{RH}^0(S_{\mathcal{P}0})$ is feasible. By the definition of $V(p)$, for all $p \in S_{\mathcal{P}}$, if $p \to_{\mathcal{P}} p'$, then $V(p') < \infty$ if and only if $V(p) < \infty$. Since $\mathrm{RH}^0(S_{\mathcal{P}0})$ is feasible, $p_1 = p^{\star}_{1|0}$ and thus $V(p_1) < \infty$. At each time $k > 0$, if $V(p_k) < \infty$, Lemmas IV.2, IV.3 and IV.4 show that the controller $\mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ is feasible. Since $p_{k+1} = p^{\star}_{1|k}$, we also have $V(p_{k+1}) < \infty$. By induction, $\mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ is feasible for all $k = 1, 2, \ldots$. ■

Finally, we show that Alg. 1 always produces an infinite trajectory satisfying the given LTL formula $\phi$, which solves Prob. II.2.

Theorem IV.7 (Correctness of Alg. 1). Assume that there exists a satisfying run originating from $q_0$ for a transition system $\mathcal{T}$ and an LTL formula $\phi$. Then Alg. 1 produces an (infinite) trajectory $\mathbf{q} = q_0 q_1 \ldots$ satisfying $\phi$.

Proof. If there exists a satisfying run originating from $q_0$, then there exists a state $p_0 \in S_{\mathcal{P}0}$ such that $V(p_0) < \infty$. By Thm. IV.6, the receding horizon controller is feasible for all $k \geq 0$, so Alg. 1 always produces an infinite trajectory $\mathbf{q}$.

Consider the state $p_k$ at time $k > 0$. If $V(p_k) > 0$, then either Case 1 or Case 2 of the controller $\mathrm{RH}(p_k, \mathbf{P}^{\star}_{k-1})$ applies. Suppose Case 1 applies. Then $V(p^{\star}_{N|k}) > V(p^{\star}_{N|k+1}) > V(p^{\star}_{N|k+2}) > \ldots$, so there exists $j > k$ such that $V(p^{\star}_{N|j}) = 0$. This holds because the state-space $S_{\mathcal{P}}$ is finite, so the energy function $V(p)$ takes only finitely many values. From time $j$, Case 2 of the proposed controller is active until time $l = j + i^0(\mathbf{P}^{\star}_j)$, at which $V(p_l) = 0$. Therefore, whenever $V(p_k) > 0$, repeated application of the receding horizon controller reaches some $l > k$ with $V(p_l) = 0$. If $V(p_k) = 0$, then Case 3 of the proposed controller applies, and either $V(p_{k+1}) = 0$ or $V(p_{k+1}) > 0$. In either case, by the previous argument, there exists $j > k$ with $V(p_j) = 0$.

Therefore, at any time $k$, there exists a finite $j > k$ with $V(p_j) = 0$. It follows that $V(p_k) = 0$ holds at infinitely many times. By the definition of $V(p)$, $p \in S_{\mathcal{P}}$, $V(p_k) = 0$ is equivalent to $p_k \in F^{\star}_{\mathcal{P}} \subseteq F_{\mathcal{P}}$. Hence the trajectory $\mathbf{p}$ is accepting. The trajectory produced on $\mathcal{T}$ is exactly the projection $\mathbf{q} = \pi(\mathbf{p})$, so $\mathbf{q}$ satisfies $\phi$, which completes the proof. ■

C. Discussions 

It is possible to extend the optimization problem of 
maximizing rewards to other meaningful cost functions. 
For example, it is possible to assign penalties or costs on 
states of the system and minimize the accumulated cost of 
trajectories in the horizon. It is also possible to define costs 
on state transitions and minimize the control effort (or the 
combination of this cost function with the one above). 

The complexity of the off-line portion of Alg. 1 depends
on the size of $\mathcal{P}$. Denoting by $|S|$ the cardinality of a
set $S$, from [18], a Büchi automaton translated from an
LTL formula $\phi$ over $\Pi$ contains at most $|\phi| \times 2^{|\phi|}$ states¹.
Therefore, the size of $S_{\mathcal{P}}$ is bounded by $|Q| \times |\phi| \times 2^{|\phi|}$.
From [16], the complexity of generating the energy function
is polynomial in $|S_{\mathcal{P}}|$ and $|F_{\mathcal{P}}|$.

The complexity of the on-line portion of Alg. 1 is highly
dependent on the horizon $N$. If the maximal number of
transitions at each state of $\mathcal{P}$ is $\Delta_{\mathcal{P}}$, then the complexity
at each iteration of the receding horizon controller is bounded
by $(\Delta_{\mathcal{P}})^N$, assuming a depth-first search algorithm is used
to find the optimal trajectory. It may be possible to reduce
this complexity from exponential to polynomial by using a
more efficient graph search algorithm based on dynamic
programming. This will be studied in future research.

¹In practice, this upper limit is almost never reached (see [1]).
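The depth-first search mentioned above can be sketched as follows, assuming the product automaton is given as a successor dictionary and that a prediction is scored by the sum of rewards at its $N$ predicted states (on $\mathcal{P}$, the reward of a product state would be the reward of its $\mathcal{T}$ component). The search enumerates every $N$-step path, which is where the $(\Delta_{\mathcal{P}})^N$ bound comes from. The `feasible` predicate is a hypothetical placeholder for the energy-function constraints of Cases 1 to 3, which are not reproduced here.

```python
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple

State = Hashable


def dfs_best_path(
    succ: Dict[State, Sequence[State]],
    start: State,
    horizon: int,
    reward: Callable[[State], float],
    feasible: Callable[[List[State]], bool] = lambda path: True,
) -> Tuple[Optional[List[State]], float, int]:
    """Exhaustively search all horizon-step paths from `start` by depth-first search.

    Returns (best path, its collected reward, number of complete paths examined).
    `feasible` is a hypothetical hook standing in for the controller's constraints.
    """
    best_path: Optional[List[State]] = None
    best_value = float("-inf")
    leaves = 0

    def visit(path: List[State], value: float) -> None:
        nonlocal best_path, best_value, leaves
        if len(path) == horizon + 1:  # a complete prediction p_{0|k}, ..., p_{N|k}
            leaves += 1
            if feasible(path) and value > best_value:
                best_path, best_value = list(path), value
            return
        # States without successors end the branch early; such partial paths are dropped.
        for nxt in succ.get(path[-1], ()):
            path.append(nxt)
            visit(path, value + reward(nxt))
            path.pop()

    visit([start], 0.0)
    return best_path, best_value, leaves


if __name__ == "__main__":
    succ = {0: [0, 1], 1: [0, 1]}  # every state has 2 successors (Delta = 2)
    reward = {0: 0.0, 1: 1.0}.get
    path, value, leaves = dfs_best_path(succ, 0, 3, reward)
    print(path, value, leaves)  # [0, 1, 1, 1] 3.0 8  (8 = 2**3 complete paths)
    # Require the prediction to end in state 0, as a stand-in terminal constraint.
    print(dfs_best_path(succ, 0, 3, reward, feasible=lambda p: p[-1] == 0)[:2])  # ([0, 1, 1, 0], 2.0)
```

The leaf counter makes the exponential growth visible: with two successors per state and $N = 3$, exactly $2^3 = 8$ complete paths are examined.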

V. Software Implementation and Case Study 

The control framework presented in this paper was
implemented in a user-friendly software package, available at
http://hyness.bu.edu/LTL_MPC.html. To use
this software, a user needs to input the finite transition
system $\mathcal{T}$, an LTL formula $\phi$, the horizon $N$, and a function
$\mathcal{R}(q, k)$ that generates the time-varying rewards defined on
the states of $\mathcal{T}$. The software executes the control algorithm
outlined in Alg. 1 and produces a trajectory in $\mathcal{T}$ that
satisfies $\phi$ and maximizes the rewards collected locally with
the proposed receding horizon control laws. This software
uses the LTL2BA [18] tool to translate an LTL
formula into a Büchi automaton.

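We now present a case study using the software
package. In this case study, the transition system is
defined by the vertices of a rectangular grid, as shown in Fig. 1.
We consider the following LTL formula, which expresses a
robotic surveillance task:

$$
\begin{aligned}
\phi := {} & \mathbf{G}\mathbf{F}\,\mathrm{base} \\
& \wedge \mathbf{G}\,(\mathrm{base} \rightarrow \mathbf{X}\neg\mathrm{base}\ \mathbf{U}\ \mathrm{survey}) \\
& \wedge \mathbf{G}\,(\mathrm{survey} \rightarrow \mathbf{X}\neg\mathrm{survey}\ \mathbf{U}\ \mathrm{recharge}) \\
& \wedge \mathbf{G}\,\neg\mathrm{unsafe}.
\end{aligned} \tag{10}
$$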

The first line of $\phi$, $\mathbf{G}\mathbf{F}\,\mathrm{base}$, enforces that the state
with observation base is repeatedly visited (possibly for
uploading data). The second line ensures that after base
is reached, the system is driven to a state with observation
survey before going back to base. Similarly, the third line
ensures that after reaching survey, the system is driven to
a state with observation recharge before going back to
survey. The last line ensures that the states with observation
unsafe are avoided at all times.
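Since the software relies on LTL2BA [18] for the translation step, it may help to see formula (10) written in LTL2BA's input syntax, where `[]` stands for $\mathbf{G}$, `<>` for $\mathbf{F}$, `!` for negation and `&&` for conjunction. The helper functions below are our own illustration, not part of the authors' package. They parenthesize every subformula explicitly, and we read $\mathbf{X}\neg\mathrm{base}\ \mathbf{U}\ \mathrm{survey}$ as $\mathbf{X}(\neg\mathrm{base}\ \mathbf{U}\ \mathrm{survey})$, which matches the informal description above. The tool is invoked only if an `ltl2ba` binary is found on the PATH.

```python
import shlex
import shutil
import subprocess
from typing import List


# Small helpers that emit LTL2BA's concrete syntax: [] = G, <> = F, X, U, !, &&, ->.
def always(f: str) -> str:
    return f"[]({f})"


def eventually(f: str) -> str:
    return f"<>({f})"


def nxt(f: str) -> str:
    return f"X({f})"


def until(a: str, b: str) -> str:
    return f"({a}) U ({b})"


def neg(f: str) -> str:
    return f"!({f})"


def implies(a: str, b: str) -> str:
    return f"({a}) -> ({b})"


def conj(*fs: str) -> str:
    return " && ".join(f"({f})" for f in fs)


# Formula (10), reading "X !base U survey" as X(!base U survey).
PHI = conj(
    always(eventually("base")),
    always(implies("base", nxt(until(neg("base"), "survey")))),
    always(implies("survey", nxt(until(neg("survey"), "recharge")))),
    always(neg("unsafe")),
)


def ltl2ba_command(formula: str) -> List[str]:
    """Argument list for invoking LTL2BA on a formula."""
    return ["ltl2ba", "-f", formula]


if __name__ == "__main__":
    cmd = ltl2ba_command(PHI)
    print(shlex.join(cmd))
    if shutil.which("ltl2ba"):  # only run if the tool is installed on PATH
        print(subprocess.run(cmd, capture_output=True, text=True, check=True).stdout)
```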

We assume that at each state $q \in Q$, the rewards at a state
$q'$ can be observed if the Euclidean distance between $q$ and
$q'$ is less than or equal to 25. In this case study, we define
$\mathcal{R}(q, k)$ as follows. At time $k = 0$, the reward value $\mathcal{R}(q, 0)$
at each state $q$ is generated randomly by uniform sampling
in the range $[10, 25]$. At each subsequent time $k > 0$, if
the reward value at a state is positive, then it decays at a
specific rate. Otherwise, there is a probability that a reward
is assigned to this state, with a value chosen by uniform
sampling in the range $[10, 25]$. In this case study, the states
with rewards can be seen as "targets", and the reward values
can be seen as the "amount of interest" associated with each
target. The control objective of maximizing the collected
rewards can then be interpreted as maximizing the information
gathered from surveying states of high interest.
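The reward model above can be sketched as a small generator class. The sampling range $[10, 25]$ and the observation radius 25 come from the case study; the decay rule (a fixed amount subtracted per step and clipped at zero), the decay amount, the spawning probability, the random seed and the grid coordinates used in the example are hypothetical choices, since the text does not specify them.

```python
import math
import random
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


class RewardField:
    """Hypothetical generator for the time-varying rewards R(q, k) of the case study."""

    def __init__(self, states: Iterable[Point], decay: float = 2.0, spawn_prob: float = 0.05,
                 low: float = 10.0, high: float = 25.0, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.decay = decay            # assumed linear decay per time-step
        self.spawn_prob = spawn_prob  # assumed probability that an empty state gets a reward
        self.low, self.high = low, high
        # R(q, 0): uniform sample in [low, high] at every state.
        self.values: Dict[Point, float] = {q: self.rng.uniform(low, high) for q in states}

    def step(self) -> None:
        """Advance the rewards from time k to time k + 1."""
        for q, r in list(self.values.items()):
            if r > 0:
                self.values[q] = max(0.0, r - self.decay)  # positive rewards decay
            elif self.rng.random() < self.spawn_prob:
                self.values[q] = self.rng.uniform(self.low, self.high)  # a new target appears

    def observe(self, q: Point, radius: float = 25.0) -> Dict[Point, float]:
        """Rewards visible from q: all states within Euclidean distance `radius`."""
        return {p: r for p, r in self.values.items() if math.dist(q, p) <= radius}


if __name__ == "__main__":
    # Assumed layout: a 10 x 10 grid with spacing 10 (not specified in the text).
    grid = [(10.0 * i, 10.0 * j) for i in range(10) for j in range(10)]
    field = RewardField(grid)
    print(len(field.observe((0.0, 0.0))))  # 8 states lie within distance 25 of the corner
    field.step()
```

With this layout, a state in the corner of the grid sees only 8 states (including itself), so the controller plans on partial reward information, which is one reason to re-optimize at every time-step.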
Fig. 4. Snapshots of the system trajectory under the proposed receding
horizon control laws. In all snapshots, the states with rewards are marked in
green, where the size of each state is proportional to the associated reward.
a) At time $k = 0$, the initial state of the system is marked in red (in the
lower left corner). b) The controller $\mathbf{p}^*_0 = RH^0(p_0)$ is computed at the
initial state. The optimal predicted trajectory $\mathbf{p}^*_0$ is marked by a sequence
of states in brown. c) The first transition $q_0 \rightarrow_{\mathcal{T}} q_1$ is applied on $\mathcal{T}$ and
the transition $p_0 \rightarrow_{\mathcal{P}} p_1$ is applied on $\mathcal{P}$. The current state ($q_1$) of the system
is marked in red. d) The controller $\mathbf{p}^*_1 = RH(p_1, \mathbf{p}^*_0)$ is computed at $p_1$.
The optimal predicted trajectory $\mathbf{p}^*_1$ is marked by a sequence of states in
brown.

By applying the method described in this paper, our
software package first translates $\phi$ to a Büchi automaton
$\mathcal{B}$, which has 12 states. This procedure took 0.5 seconds on
a MacBook Pro with a 2.2 GHz quad-core CPU. Since $\mathcal{T}$
contains 100 states, $|S_{\mathcal{P}}| = 100 \times 12 = 1200$. The generation of the
product automaton $\mathcal{P}$ and the computation of the energy
function $V$ took 4 seconds. In this case study, we chose
the horizon $N = 4$. Some snapshots of the system trajectory
obtained by applying Alg. 1 are shown in Fig. 4. Each iteration
of Alg. 1 took 1 to 3 seconds (the computation time varies
across iterations because different numbers of graph searches
are needed).
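The state count quoted above is simply $|Q| \times |S_{\mathcal{B}}| = 100 \times 12 = 1200$. The sketch below reproduces this arithmetic on an assumed $10 \times 10$ grid with 4-neighbour moves and self-loops; the grid shape and connectivity are our assumptions, not taken from the paper. It also prints $(\Delta_{\mathcal{T}})^N$ for the grid alone; the bound $(\Delta_{\mathcal{P}})^N$ on the product can be larger, since a nondeterministic Büchi automaton may add branching.

```python
from itertools import product
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def grid_transition_system(rows: int, cols: int, self_loops: bool = True) -> Dict[Cell, List[Cell]]:
    """Hypothetical grid T: each vertex moves to its 4-neighbours (and optionally stays)."""
    succ: Dict[Cell, List[Cell]] = {}
    for r, c in product(range(rows), range(cols)):
        nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= r + dr < rows and 0 <= c + dc < cols]
        if self_loops:
            nbrs.append((r, c))
        succ[(r, c)] = nbrs
    return succ


if __name__ == "__main__":
    T = grid_transition_system(10, 10)
    n_buchi = 12   # number of Büchi states reported for formula (10)
    horizon = 4    # N used in the case study
    print(len(T))              # 100 states in T
    print(len(T) * n_buchi)    # 1200 = |Q| x |S_B| product states
    max_out = max(len(v) for v in T.values())
    print(max_out, max_out ** horizon)  # 5 625: upper bound on length-N paths in T
```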

We applied the control algorithm for 100 time-steps and
plotted the results in Fig. 5. At the top,
we plot the energy $V(p)$ at each time-step. We see that
after 55 time-steps the energy is 0, meaning that an accepting
state is reached. Note that each time an accepting state is
reached, the system has visited the base, survey and recharge
states at least once, i.e., one cycle of the surveillance mission
task (base, survey, recharge) is completed. We also
compare the receding horizon controller with the controller
proposed in [16] at the bottom of Fig. 5. The receding
horizon controller proposed in this paper collects more
reward, since it reacts much more quickly to the time-varying
rewards. An example video of the evolution of the system
trajectory is also available at
http://hyness.bu.edu/LTL_MPC.html.




Fig. 5. Upper figure: plot of the energy $V(p)$ at the current state for 100
time-steps. Bottom figure: in blue, the cumulative rewards collected
in 100 time-steps by the proposed receding horizon controller; in red,
the cumulative rewards collected by the controller in [16] using the same
reward function $\mathcal{R}(q, k)$.



VI. Conclusion and final remarks 

In this paper, we proposed a receding horizon control framework
that optimizes the trajectory of a finite deterministic system
locally, while guaranteeing that the infinite trajectory satisfies
a given linear temporal logic formula. The optimization
criterion was defined as the maximization of time-varying
rewards associated with the states of the system. We developed
a control strategy that makes real-time control decisions to
maximize the reward while ensuring satisfaction of the LTL
specification. The proposed framework is a step toward a
synergy between model predictive control and formal
controller synthesis, which is beneficial for both areas.

Future research will address the extension of the proposed
framework to finite probabilistic systems, such as Markov
decision processes or partially observed Markov decision
processes, where the specifications are given as formulas of
probabilistic temporal logic.